Kaiwen Zheng

minWM: A Full-Stack Open-Source Framework for Real-Time Interactive Video World Models

May 28, 2026
Via arXiv

Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation

May 14, 2026
Via arXiv

Training-Free Test-Time Contrastive Learning for Large Language Models

Apr 15, 2026
Via arXiv

SLA2: Sparse-Linear Attention with Learnable Routing and QAT

Feb 13, 2026
Via arXiv

Benchmarking Multimodal Large Language Models for Missing Modality Completion in Product Catalogues

Jan 28, 2026
Via arXiv

Focal-RegionFace: Generating Fine-Grained Multi-attribute Descriptions for Arbitrarily Selected Face Focal Regions

Jan 01, 2026
Via arXiv

Vidarc: Embodied Video Diffusion Model for Closed-loop Control

Dec 19, 2025
Via arXiv

TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times

Dec 18, 2025
Via arXiv

DiffusionNFT: Online Diffusion Reinforcement with Forward Process

Sep 19, 2025
Via arXiv

Are Multimodal Embeddings Truly Beneficial for Recommendation? A Deep Dive into Whole vs. Individual Modalities

Aug 10, 2025
Via arXiv